1 Cluster analysis
A cluster analysis of the groundwater-level regimes of all piezometers in the Loire catchment was conducted. The data have not been quality-checked yet. First, monthly means were calculated; these were then z-score normalised, and finally the long-term mean of each calendar month was computed. Stations that still contained NAs after these steps were excluded from the clustering.
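The preprocessing described above can be sketched in base R as follows. This is a minimal illustration on synthetic data, not the original workflow: the long-format data frame `gw` and its columns `station`, `date` and `height` are hypothetical, and the sketch assumes that the z-score is computed per station over all of that station's monthly means.

```r
# Hypothetical input: one row per station and day, columns station, date, height
set.seed(1)
dates <- seq(as.Date("2000-01-01"), as.Date("2002-12-31"), by = "day")
gw <- data.frame(
  station = rep(c("A", "B"), each = length(dates)),
  date    = rep(dates, 2),
  height  = c(sin(2 * pi * as.numeric(format(dates, "%j")) / 365) +
                rnorm(length(dates), sd = 0.1),
              rnorm(length(dates)))
)
# Station B: no July values in any year
gw$height[gw$station == "B" & format(gw$date, "%m") == "07"] <- NA

gw$year  <- as.integer(format(gw$date, "%Y"))
gw$month <- as.integer(format(gw$date, "%m"))

# 1. Monthly mean per station, year and month (aggregate() drops rows with NA height)
monthly <- aggregate(height ~ station + year + month, data = gw, FUN = mean)

# 2. z-score normalisation of the monthly means, separately for each station
monthly$z <- ave(monthly$height, monthly$station,
                 FUN = function(h) (h - mean(h)) / sd(h))

# 3. Long-term mean of the z-scores per station and calendar month (the regime)
regime <- aggregate(z ~ station + month, data = monthly, FUN = mean)

# 4. Wide table: one row per station, one column per calendar month
wide <- reshape(regime, idvar = "station", timevar = "month", direction = "wide")

# Stations with any NA in their regime are excluded from clustering
complete <- wide[complete.cases(wide), ]
complete
```

Because station B has no July values at all, its July regime value is NA after reshaping, so only station A remains in `complete`.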
1.1 Hierarchical clustering
1.2 K-means clustering
2 Cluster analysis - filtered data
In the ADES dataset, missing values (NAs) occur only in the variable rel_depth, which suggests that the variable height is interpolated. For now, the stations were filtered on the availability of rel_depth: only years with more than 90% available rel_depth data were kept, with a start year of 2000 and a minimum record length of 18 years. Nevertheless, the variable height is still used for normalisation and clustering, under the assumption that its interpolation is adequate.
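The station filter can be sketched in base R on synthetic daily data. The data frame `obs` and its columns `station`, `date` and `rel_depth` are hypothetical. Two readings of the criteria are assumptions: availability is taken as the share of non-missing rel_depth values among the daily rows of a year, and the start year and length criteria are read as "at least 18 valid years from 2000 onwards".

```r
# Hypothetical daily records; missing rel_depth values are stored as NA
make_station <- function(id, start, na_every) {
  d <- seq(as.Date(start), as.Date("2020-12-31"), by = "day")
  x <- sin(seq_along(d) / 58)              # synthetic signal
  x[seq_along(d) %% na_every == 0] <- NA   # every na_every-th day is missing
  data.frame(station = id, date = d, rel_depth = x)
}
obs <- rbind(make_station("A", "1995-01-01", 50),  # long, about 2% missing
             make_station("B", "1995-01-01", 3),   # long, about one third missing
             make_station("C", "2010-01-01", 50))  # nearly complete, but short

min_share  <- 0.9   # more than 90% of rel_depth available per year
first_year <- 2000  # assumption: only years from 2000 onwards are considered
min_years  <- 18    # assumption: at least 18 valid years per station

obs$year <- as.integer(format(obs$date, "%Y"))
obs <- obs[obs$year >= first_year, ]

# Share of non-missing rel_depth values per station and year
avail <- aggregate(rel_depth ~ station + year, data = obs,
                   FUN = function(v) mean(!is.na(v)), na.action = na.pass)
valid_years <- avail[avail$rel_depth > min_share, c("station", "year")]

# Keep only stations with enough valid years
n_valid <- table(valid_years$station)
keep <- names(n_valid)[n_valid >= min_years]

# Restrict the records to valid station-years of the kept stations
obs_filtered <- merge(obs, valid_years, by = c("station", "year"))
obs_filtered <- obs_filtered[obs_filtered$station %in% keep, ]
keep
```

Station B fails the yearly availability threshold, and station C has only 11 valid years, so only station A is kept.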
2.1 K-means clustering
Six clusters were chosen to test whether this allows a better distinction between influenced and non-influenced piezometers.
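A k-means run with six clusters can be sketched with base R's `kmeans()`. The regime matrix below is synthetic (30 hypothetical stations with 12 monthly values each), and `nstart = 25` is an assumption, since the number of random starts is not stated here; using several starts reduces the risk of ending in a poor local optimum.

```r
set.seed(3)
months <- 1:12
# Hypothetical regimes: 6 phase-shifted seasonal shapes, 5 noisy stations per shape
shapes  <- sapply(0:5, function(s) sin(2 * pi * (months - 1) / 12 + s * pi / 3))
regimes <- t(shapes[, rep(1:6, each = 5)]) +
  matrix(rnorm(30 * 12, sd = 0.1), nrow = 30)
colnames(regimes) <- month.abb
rownames(regimes) <- sprintf("station_%02d", 1:30)

# k-means with k = 6 on the station x month matrix
km <- kmeans(regimes, centers = 6, nstart = 25)
table(km$cluster)  # number of stations per cluster
km$centers         # mean regime of each cluster
```

Inspecting `km$centers` alongside the cluster membership is one way to judge whether the clusters line up with influenced and non-influenced piezometers.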
2.2 Random Forest clustering
A Random Forest (RF) model with 5000 trees was trained in unsupervised mode. Clustering was then performed by applying Partitioning Around Medoids (PAM) with six clusters to the proximity matrix of the RF.
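library(randomForest)
library(cluster)
# No response is given, so randomForest() runs in unsupervised mode
rf <- randomForest(x = df_m_wide2, ntree = 5000, proximity = TRUE)
prox <- rf$proximity  # n x n proximity matrix between the rows of df_m_wide2
pam.rf <- pam(prox, 6)  # prox is a plain matrix, so pam() clusters its rows by Euclidean distance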
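RF proximities are similarities: 1 on the diagonal and closer to 0 for observations that rarely end up in the same terminal node. An alternative to the call above, common when clustering RF output, is to convert them into dissimilarities, for example `1 - prox`, and pass these to `pam()` with `diss = TRUE`. The toy proximity matrix below is hypothetical and only stands in for `rf$proximity`.

```r
library(cluster)

# Toy proximity matrix for 6 hypothetical stations in two groups:
# 0.8 within a group, 0.1 between groups, 1 on the diagonal
grp  <- rep(1:2, each = 3)
prox <- ifelse(outer(grp, grp, "=="), 0.8, 0.1)
diag(prox) <- 1

# Convert similarities to dissimilarities and mark them as such for pam()
d <- as.dist(1 - prox)
pam.toy <- pam(d, k = 2, diss = TRUE)
pam.toy$clustering
```

For the objects in this section the equivalent call would be `pam(as.dist(1 - prox), k = 6, diss = TRUE)`. The two variants are not equivalent: `pam(prox, 6)` treats each row of `prox` as a feature vector, whereas this one uses the proximities directly as distances.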